The Graduate School SEMI-SUPERVISED CLUSTERING FOR HIGH-DIMENSIONAL AND SPARSE FEATURES

Authors

Dongwon Lee

Carleen Maitland

Abstract

Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised: class labels are not known a priori. In real application domains, however, some "weak" side information about the domain or the data sets is often available or derivable. In particular, side information in the form of instance-level pairwise constraints is general and relatively easy to derive. The problem with traditional clustering techniques is that they cannot benefit from such side information even when it is available. I study the problem of semi-supervised clustering, which aims to partition a set of unlabeled data items into coherent groups given a collection of constraints. Because semi-supervised clustering promises higher quality with little extra human effort, it is of great interest both in theory and in practice. Semi-supervised clustering shares a difficulty with many other learning methods in the data mining literature: they lose their algorithmic effectiveness on high-dimensional data. I focus on data with high-dimensional sparse features and present a series of novel semi-supervised clustering approaches that are...
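
As a rough illustration of clustering with instance-level pairwise constraints (not the method proposed in this dissertation), the sketch below implements COP-KMeans (Wagstaff et al., 2001), a well-known baseline that treats must-link and cannot-link pairs as hard constraints during the k-means assignment step. Assumptions: points are dense Python lists, distance is squared Euclidean, and the first k points seed the centers; the function names are hypothetical. The abstract's point is that such raw distances become uninformative on high-dimensional sparse data, so this serves only as a baseline.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance (assumed metric for this sketch).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _adjacency(pairs: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    # Symmetric lookup: item index -> set of constrained partner indices.
    adj: Dict[int, Set[int]] = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def _violates(i: int, c: int, assign: Dict[int, int],
              must: Dict[int, Set[int]], cannot: Dict[int, Set[int]]) -> bool:
    # A must-link partner already placed in a different cluster is a violation.
    for j in must.get(i, ()):
        if j in assign and assign[j] != c:
            return True
    # A cannot-link partner already placed in the same cluster is a violation.
    for j in cannot.get(i, ()):
        if assign.get(j) == c:
            return True
    return False


def cop_kmeans(points: List[Point], k: int,
               must_link: Iterable[Tuple[int, int]],
               cannot_link: Iterable[Tuple[int, int]],
               max_iter: int = 100) -> Optional[List[int]]:
    must = _adjacency(must_link)
    cannot = _adjacency(cannot_link)
    # Hypothetical deterministic seeding: the first k points become the centers.
    centers = [list(p) for p in points[:k]]
    dim = len(points[0])
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        assign: Dict[int, int] = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first feasible one.
            for c in sorted(range(k), key=lambda c: _sq_dist(p, centers[c])):
                if not _violates(i, c, assign, must, cannot):
                    assign[i] = c
                    break
            else:
                return None  # no feasible cluster: COP-KMeans reports failure
        new_labels = [assign[i] for i in range(len(points))]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return labels
```

For example, with points [[0, 0], [0, 1], [10, 10], [10, 11]], k = 2 and the cannot-link pair (0, 1), items 0 and 1 end up in different clusters. Like the original greedy algorithm, the sketch returns None when an item has no cluster that satisfies its constraints.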

Similar resources

Semi-supervised Hierarchical Clustering Analysis for High Dimensional Data

In many data mining tasks, there is a large supply of unlabeled data but only limited labeled data, since labels are expensive to generate. A number of semi-supervised clustering algorithms have therefore been proposed, but few of them are specifically designed for high-dimensional data. High dimensionality is a difficult challenge for clustering analysis due to the inherently sparse distribution, and most of p...

Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces

Clustering high-dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high-dimensional space. To address this challenge, we propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster membership constraints. In particular, we project high-dimensional data onto a much red...

Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations

The ability to record the high-resolution spectral signature of the earth's surface is arguably the most important feature of hyperspectral sensors. Classification of hyperspectral imagery, in turn, is one of the main methods for extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images from the information-content point of view, there...

Sparse Modeling of High-Dimensional Data for Learning and Vision

Sparse representations account for most or all of the information of a signal by a linear combination of a few elementary signals called atoms, and have increasingly become recognized as providing high performance for applications as diverse as noise reduction, compression, inpainting, compressive sensing, pattern classification, and blind source separation. In this dissertation, we learn the s...

Fused Feature Representation Discovery for High-Dimensional and Sparse Data

The automatic discovery of a significant low-dimensional feature representation from a given data set is a fundamental problem in machine learning. This paper focuses specifically on the development of the feature representation discovery methods appropriate for high-dimensional and sparse data. We formulate our feature representation discovery problem as a variant of the semi-supervised learni...

Publication date: 2010